Generalize table storage and scans #614
@adsharma could you PTAL?
The failing test is unrelated:
This refactor has been carried out based on the assumption that any new external table formats will strictly conform to
Flaky GitHub infra. Rerunning makes the tests pass.
How about On
ColumnarTables use NodeTable's functions which expect
The motivation behind returning a
Node and rel tables are meant to be different. Moreover we don't have a common
could be done as part of #614 (comment)
It doesn't modify members of the class 🤔
|
The larger question is: vs We arrived at the current design with the idea that But you bring up a good point about breaking ABI. Is it the extension ABI you're talking about or something else?
I will get back to it by tomorrow.
Ideally it should be the 2nd one. But before designing top interfaces i.e., PS: Not sure why
100% agree. The idea is to add virtual empty functions and override them only in non-native tables
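A minimal sketch of that idea, assuming a shared base class whose hooks default to no-ops. The names `TableBase`, `NativeTable`, `ExternalTable`, `openExternalSource` and `getNumExternalBatches` are hypothetical and only illustrate the pattern; they are not taken from the ladybug code base.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical base class: the hooks are virtual with empty default bodies,
// so native tables inherit them unchanged and pay no implementation cost.
class TableBase {
public:
    virtual ~TableBase() = default;
    virtual void openExternalSource(const std::string& /*uri*/) {}
    virtual uint64_t getNumExternalBatches() const { return 0; }
};

// Native table: overrides nothing, the empty defaults apply.
class NativeTable : public TableBase {};

// Non-native table: overrides only the hooks it actually needs.
class ExternalTable : public TableBase {
public:
    explicit ExternalTable(uint64_t numBatches) : numBatches_(numBatches) {}

    void openExternalSource(const std::string& uri) override { uri_ = uri; }
    uint64_t getNumExternalBatches() const override { return numBatches_; }
    const std::string& uri() const { return uri_; }

private:
    std::string uri_;
    uint64_t numBatches_;
};
```

Callers that hold a `TableBase&` can invoke the hooks unconditionally: a native table silently does nothing, while an external table runs its override. Note that adding virtual functions changes the vtable layout, which is the kind of ABI break discussed in this thread.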
If the extensions are the only ones consuming the ladybug ABI, then yes. Since we build and release the core and extensions together, I believe the risk is low 🤔. The only issue is loading extensions built against old ladybug code.
Arrow is special in that it doesn't have the concept of a Node/Row group and we're externally imposing it. It's a memory based format that provides random access vs other formats being block based.
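To illustrate what "externally imposing" a group structure on a random-access format can look like, here is a rough sketch that maps a flat row offset to a (group, offset-in-group) pair using a fixed group size. The names `GroupPosition`, `locateRow` and `numGroups` and the group size passed in are assumptions for illustration only, not ladybug's or Arrow's actual API.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical position of a row inside an imposed group structure.
struct GroupPosition {
    uint64_t groupIdx;       // which imposed group the row falls into
    uint64_t offsetInGroup;  // position of the row inside that group
};

// Map a flat row offset to its group and in-group offset.
// groupSize must be non-zero; its value here is an assumption.
inline GroupPosition locateRow(uint64_t rowOffset, uint64_t groupSize) {
    assert(groupSize > 0);
    return GroupPosition{rowOffset / groupSize, rowOffset % groupSize};
}

// Number of groups needed to cover numRows rows (ceiling division).
inline uint64_t numGroups(uint64_t numRows, uint64_t groupSize) {
    assert(groupSize > 0);
    return (numRows + groupSize - 1) / groupSize;
}
```

Because the underlying data supports random access, the grouping is purely arithmetic: no block boundaries in the storage format have to line up with the imposed group size.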
This is something we don't support today, but it's a good goal to shoot for. There is a period between one release (such as 0.17.1) and the next (0.18.x) when you can't use published extensions if ABI-breaking changes went in. Sounds like we have consensus that the second design is preferable, but the cost of doing it now may be high? Also, Lance may differ from other Parquet-like formats in aspiring to be a single-file database (with indexes, incremental updates, etc.) as opposed to being a generic, simpler read-only format like Parquet. In that sense it might be closer to ladybug native tables than to Parquet.
Yeah. Refactor would be huge, but we will be risking just the
Yeah. However, right now every table has its own folder containing both its data and metadata. Not sure about the future. That's where the Lance catalog can be used. Should we consult the Lance folks?
I'm thinking all of this is post 0.18.x release. Lance catalog: we should distinguish the two use cases:
Would be great to get review from Lance folks if possible.
I don't think it's possible to get a review from them. They hardly respond on Lance repos 🙃. I can ask them about
deepwiki says you can |
No reply from the Lance maintainers. I think it's better to follow lance-graph: support dataset URIs and namespaces
arrowTable::getNumBatches